
PERF: further improve take (reindex/unstack) for ArrayManager #40300


Merged
merged 9 commits into pandas-dev:master from the am-perf-take-2 branch on Mar 15, 2021

Conversation

jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Mar 8, 2021

A few related changes to take_1d / ArrayManager.unstack:

  • Check in ArrayManager.unstack whether there will be missing values introduced or not, and if not, pass allow_fill=False to the take function, so this can take a more performant path (and doesn't need to check repeatedly for indexer == -1). And the same for ArrayManager.reindex_indexer.
  • In take_1d, if allow_fill=False, use numpy's take instead of our own cython implementation (numpy is a bit faster); see the sketch after this list.
  • Move the ensure_int64 check for the indexer out of the shared _take_preprocess_indexer_and_fill_value. This way, we can do this once in ArrayManager.unstack and avoid doing it repeatedly for the same indexer inside take_1d.
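
A rough sketch of the idea behind the first two points (simplified, not the actual pandas internals; take_1d_sketch and its promotion logic are illustrative assumptions):

    import numpy as np

    def take_1d_sketch(arr, indexer, fill_value=np.nan):
        # only go through the fill/mask machinery when -1 ("missing") is present
        if not (indexer == -1).any():
            # fast path (allow_fill=False): plain numpy take, no repeated mask checks
            return arr.take(indexer)
        # slow path: promote the dtype so the fill value fits, then fill the holes
        out = arr.astype(np.result_type(arr.dtype, type(fill_value))).take(indexer)
        out[indexer == -1] = fill_value
        return out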

Case from the frame_methods.py::Unstack.time_full_product ASV benchmark:

dtype = 'int'
m = 100
n = 1000

levels = np.arange(m)
index = pd.MultiIndex.from_product([levels] * 2)
columns = np.arange(n)
values = np.arange(m * m * n).reshape(m * m, n)

df = pd.DataFrame(values, index, columns)
df_am = df._as_manager("array")

with the combined changes, this gives:

In [2]: %timeit df_am.unstack()
1.09 s ± 68.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <-- master
276 ms ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <-- PR

@jorisvandenbossche jorisvandenbossche added the Performance, Reshaping and Algos labels on Mar 8, 2021
Comment on lines +485 to +500
# check for promotion based on types only (do this first because
# it's faster than computing a mask)
dtype, fill_value = maybe_promote(arr.dtype, fill_value)
if dtype != arr.dtype and (out is None or out.dtype != dtype):
    # check if promotion is actually required based on indexer
    mask = indexer == -1
    needs_masking = mask.any()
    mask_info = mask, needs_masking
    if needs_masking:
        if out is not None and out.dtype != dtype:
            raise TypeError("Incompatible type for fill_value")
    else:
        # if not, then depromote, set fill_value to dummy
        # (it won't be used but we don't want the cython code
        # to crash when trying to cast it to dtype)
        dtype, fill_value = arr.dtype, arr.dtype.type()
Member Author

Nothing changed in those lines of the code; it's only a change in indentation (which GitHub doesn't seem to detect nicely).

Only the first lines:

    if indexer is None:
        indexer = np.arange(arr.shape[axis], dtype=np.int64)
        dtype, fill_value = arr.dtype, arr.dtype.type()
    else:
        indexer = ensure_int64(indexer, copy=False)

have been moved out into _take_nd_ndarray
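
In simplified form (a sketch of that refactor, not the PR's exact code; ensure_int64 is approximated with np.asarray here):

    import numpy as np

    def _take_nd_ndarray_sketch(arr, indexer, axis, fill_value):
        # the indexer normalization now happens once in the nd wrapper ...
        if indexer is None:
            indexer = np.arange(arr.shape[axis], dtype=np.int64)
            dtype, fill_value = arr.dtype, arr.dtype.type()
        else:
            indexer = np.asarray(indexer, dtype=np.int64)  # stand-in for ensure_int64
            dtype = arr.dtype
        # ... so take_1d, which already receives a normalized int64 indexer,
        # no longer repeats that work before the shared preprocessing step
        return indexer, dtype, fill_value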

Comment on lines +85 to +89
if indexer is None:
    indexer = np.arange(arr.shape[axis], dtype=np.int64)
    dtype, fill_value = arr.dtype, arr.dtype.type()
else:
    indexer = ensure_int64(indexer, copy=False)
Member Author

This part is copied from _take_preprocess_indexer_and_fill_value, so we can avoid doing that in take_1d

@@ -999,11 +1000,16 @@ def _reindex_indexer(

else:
    validate_indices(indexer, len(self._axes[0]))
    indexer = ensure_int64(indexer)
Contributor

why are you converting to int64? we use intp for indexing via take

Member Author

That's existing code (just cut and paste). And our cython take algos always use int64:

const int64_t[:] indexer,

Member

should they though? i expect they ought to be intp_t

Member

I'm increasingly confident that intp_t is the correct thing to do, but it's out of scope for this PR, xref #40390

if (indexer == -1).any():
    allow_fill = True
else:
    allow_fill = False
Member

@jorisvandenbossche in looking at the benchmarks you posted in #39146 it seems like some of them involve optimizations implemented for AM that could be implemented for BM but weren't. is there anything here that falls into that category?

Member Author

in looking at the benchmarks you posted in #39146 it seems like some of them involve optimizations implemented for AM that could be implemented for BM but weren't

Are you thinking of something specific?

The optimization I do here above can probably be useful if you have lots of non-consolidated blocks, but for the benchmark cases with single blocks posted in #39146, that wouldn't make a difference.

Member

Series has an algos.take_nd call that I think could be take_1d. Also several of the EA.take implementations.

@jorisvandenbossche
Member Author

Are there further comments here?

  indexer, dtype, fill_value, mask_info = _take_preprocess_indexer_and_fill_value(
-     arr, indexer, 0, None, fill_value, allow_fill
+     arr, indexer, None, fill_value, allow_fill
Member

do we ever get to _take_preprocess_indexer_and_fill_value with allow_fill=False?

Member Author

Yes, that's currently possible in take_nd, and a quick search reveals that we have a few cases throughout pandas that call take_nd specifying this argument. That's probably something to optimize, though.
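
For reference, the public pandas.api.extensions.take exposes the same allow_fill switch (illustration only, not the internal code path touched by this PR):

    import numpy as np
    from pandas.api.extensions import take

    arr = np.array([10, 20, 30])
    # allow_fill=False: -1 is an ordinary (wrap-around) index, plain numpy semantics
    print(take(arr, [0, -1], allow_fill=False))   # [10 30]
    # allow_fill=True: -1 marks a missing value, so the result is upcast and filled
    print(take(arr, [0, -1], allow_fill=True))    # [10. nan]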

@jreback
Contributor

jreback commented Mar 10, 2021

yeah, I think generally we should be using intp for indexing, as that's what numpy does

we should change our cython routines to do this (this may be a non-trivial ask)

it is likely this is a perf bottleneck (as we are moving between intp and int64 a fair amount)

I could be wrong though and this is not really an issue

@jreback
Contributor

jreback commented Mar 11, 2021

checks are failing (maybe just a rebase is needed)

@jbrockmendel
Member

it is likely this is a perf bottleneck (as we are moving between intp and int64 a fair amount)

aren't these aliases on 64-bit builds?
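
A quick check (illustrative; on 32-bit builds np.intp is only 32 bits, so they differ there):

    import numpy as np

    # on typical 64-bit builds both have an 8-byte item size and compare equal,
    # so intp <-> int64 conversion is cheap, though not entirely free
    print(np.dtype(np.intp).itemsize, np.dtype(np.int64).itemsize)
    print(np.dtype(np.intp) == np.dtype(np.int64))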

@jbrockmendel
Member

Can/should we just have take_nd do an earlier check for ndim == 1 and then dispatch to take_1d?

In NDArrayBackedExtensionArray.take, if I change take_nd to

        if self.ndim == 1 and axis == 0 and allow_fill:
            indices = ensure_platform_int(indices)
            validate_indices(indices, len(self))
            new_data = take_1d(
                self._ndarray, indices, allow_fill=allow_fill, fill_value=fill_value
            )
        else:
            new_data = [what we do right now]

then on the time_full_product benchmark referenced in #39146 (comment) we shave off 25-30% relative to master (non-AM)

@jorisvandenbossche
Member Author

Can/should we just have take_nd do an earlier check for ndim == 1 and then dispatch to take_1d?

That might be possible (although the check for EA and np.asarray might get a bit in the way for that). But we would still have the separate take_1d implementation, so I personally prefer to keep using that for explicitness in those places where we know it's only about 1D arrays (like the places in ArrayManager where I am using it now).

@jbrockmendel
Member

so I personally prefer to keep using that for explicitness in those places where we know it's only about 1D arrays (like the places in ArrayManager where I am using it now).

I'm fine with that, just suggesting that if we have a clear optimization for the general take, let's use it.

@jorisvandenbossche
Member Author

I'm fine with that, just suggesting that if we have a clear optimization for the general take, let's use it.

Certainly, see #40405

@jorisvandenbossche
Member Author

For the rest, any remaining comments before merging this?

@jbrockmendel
Member

I'll take another look this afternoon.

new_indexer2D = new_indexer.reshape(*unstacker.full_shape)
new_indexer2D = ensure_int64(new_indexer2D)
Member

can this chunk of code be made to match Block._unstack (or vice-versa) more closely?

e.g. on L1094 you check unstacker.mask.all(), which I think is related to the comment in Block._unstack (# TODO: in all tests we have mask.all(); can we rely on that?), but because mask is defined there differently it is not obvious

Member Author

Assuming you are referring to the EA-specific Block._unstack (as the other one has even less in common): that's for a single column while this is for multiple columns at once, and needs to deal with creating blocks, with placement, etc. So personally I don't see many possibilities to share code.

e.g. on L1094 you check unstacker.mask.all() which I think is related to the comment in Block._unstack

As far as I understand the Block version, I don't think that's related. For Block/BlockManager.unstack, that mask value somehow indicates which of the columns need to end up in the result (I don't really understand that code, though, as in the ArrayManager I am simply using the result of unstacker.get_new_columns without any filtering afterwards). Here I only check the mask to potentially use a faster take for performance reasons, but that doesn't influence the end result.
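
A toy illustration of that distinction (not code from the PR): when the MultiIndex is a full product, unstacking introduces no holes, so the take indexer contains no -1 entries and the allow_fill=False fast path applies; dropping a row creates a hole and forces the fill path.

    import numpy as np
    import pandas as pd

    full = pd.Series(
        np.arange(4), index=pd.MultiIndex.from_product([[0, 1], ["a", "b"]])
    )
    print(full.unstack().isna().any().any())    # False: full product, no NaNs introduced

    sparse = full.iloc[:-1]                     # drop the (1, "b") entry
    print(sparse.unstack().isna().any().any())  # True: the missing combination is filled with NaN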

Member

I don't think that's related

OK.

(I don't really understand that code, though

at some point we should make sure that one of us (active maintainers, not just the two of us) has a handle on this.

@jreback jreback added this to the 1.3 milestone Mar 12, 2021
@jreback
Contributor

jreback commented Mar 12, 2021

lgtm @jbrockmendel merge when ready

@jbrockmendel
Member

LGTM. Merging this evening absent other comments.

@jreback jreback merged commit 1ced878 into pandas-dev:master Mar 15, 2021
@jreback
Contributor

jreback commented Mar 15, 2021

thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the am-perf-take-2 branch March 15, 2021 07:24